11 research outputs found
Finite Sample Differentially Private Confidence Intervals
We study the problem of estimating finite sample confidence intervals for the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee finite sample coverage, as opposed to asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set, showing that our parameters are optimal up to polylogarithmic factors.
Causal inference in transportation safety studies: Comparison of potential outcomes and causal diagrams
The research questions that motivate transportation safety studies are causal
in nature. Safety researchers typically use observational data to answer such
questions, but often without appropriate causal inference methodology. The
field of causal inference presents several modeling frameworks for probing
empirical data to assess causal relations. This paper focuses on exploring the
applicability of two such modeling frameworks, Causal Diagrams and Potential
Outcomes, to a specific transportation safety problem. The causal effects of
pavement marking retroreflectivity on safety of a road segment were estimated.
More specifically, the results based on three different implementations of
these frameworks on a real data set were compared: Inverse Propensity Score
Weighting with regression adjustment and Propensity Score Matching with
regression adjustment versus Causal Bayesian Network. Increased pavement
marking retroreflectivity was generally found to reduce the probability of
target nighttime crashes. However, we found that the magnitude of the
estimated causal effects is sensitive to the method used and to violations of
the underlying assumptions.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS440 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
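The weighting idea behind the Potential Outcomes implementations can be sketched as follows. This is a minimal illustration, assuming the propensity scores have already been estimated and using the normalized (Hajek) form of inverse propensity weighting; it omits the regression adjustment used in the paper, and the function name and toy inputs are hypothetical.

```python
def hajek_ipw_ate(treated, outcome, propensity):
    """Normalized inverse-propensity-weighted estimate of the average
    treatment effect.

    treated    : list of 0/1 treatment indicators
    outcome    : list of observed outcomes
    propensity : list of estimated P(treated = 1 | covariates), each in (0, 1)
    """
    if not (len(treated) == len(outcome) == len(propensity)):
        raise ValueError("inputs must have the same length")
    if any(e <= 0.0 or e >= 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Treated units are weighted by 1/e, control units by 1/(1 - e).
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    if sum(w1) == 0 or sum(w0) == 0:
        raise ValueError("need at least one treated and one control unit")

    # Weighted means of the outcome in each arm, normalized by total weight.
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0
```

With constant propensity scores the estimate reduces to the plain difference in arm means, which is a convenient sanity check before plugging in fitted scores.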
Statistical models for cores decomposition of an undirected random graph
The k-core decomposition is a widely studied summary statistic that
describes a graph's global connectivity structure. In this paper, we move
beyond using k-core decomposition as a tool to summarize a graph and propose
using k-core decomposition as a tool to model random graphs. We propose using
the shell distribution vector, a way of summarizing the decomposition, as a
sufficient statistic for a family of exponential random graph models. We study
the properties and behavior of the model family, implement a Markov chain Monte
Carlo algorithm for simulating graphs from the model, implement a direct
sampler from the set of graphs with a given shell distribution, and explore the
sampling distributions of some of the commonly used complementary statistics as
good candidates for heuristic model fitting. These algorithms provide the first
fundamental steps necessary for solving the following problems: parameter
estimation in this ERGM, extending the model to its Bayesian relative, and
developing a rigorous methodology for testing goodness of fit of the model and
model selection. The methods are applied to a synthetic network as well as the
well-known Sampson monks dataset.
Comment: Subsection 3.1 is new: `Sample space restriction and degeneracy of
real-world networks'. Several clarifying comments have been added. Discussion
now mentions 2 additional specific open problems. Bibliography updated. 25
pages (including appendix), ~10 figures
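The shell distribution named in the abstract can be illustrated with a minimal sketch. It assumes the usual peeling definition of the k-core (the maximal subgraph in which every node has degree at least k) and takes a node's shell index to be the largest k whose core contains it. The function names and the adjacency-dict input are illustrative choices, not the authors' implementation, and the returned vector stops at the largest occupied shell rather than padding to a fixed length.

```python
from collections import Counter


def shell_indices(adj):
    """Return {node: shell index} for an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k cannot be in the (k+1)-core.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            # Removing v lowers its neighbours' degrees, possibly cascading.
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


def shell_distribution(adj):
    """Return [n_0, n_1, ...], where n_i counts nodes with shell index i."""
    counts = Counter(shell_indices(adj).values())
    top = max(counts, default=-1)
    return [counts.get(i, 0) for i in range(top + 1)]
```

For example, a triangle with one pendant node attached and one isolated node has one node in shell 0, one in shell 1, and three in shell 2.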
Differentially private exponential random graphs
We propose methods to release and analyze synthetic graphs in order to protect the privacy of individual relationships captured by a social network. The proposed techniques aim at fitting and estimating a wide class of exponential random graph models (ERGMs) in a differentially private manner, and thus offer rigorous privacy guarantees. More specifically, we use the randomized response mechanism to release networks under ε-edge differential privacy. To maintain utility for statistical inference, we treat the original graph as missing and propose a way to use likelihood-based inference and Markov chain Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks. We demonstrate the usefulness of the proposed techniques on a real data example.
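The release step can be sketched as below. This is a minimal illustration, assuming the simplest form of randomized response in which every dyad is flipped independently with the same probability 1 / (1 + e^ε), so that the ratio of keep to flip probabilities equals e^ε. The function names are hypothetical, and the likelihood-based ERGM fitting described in the abstract is not shown.

```python
import math
import random
from itertools import combinations


def flip_probability(epsilon):
    """Per-dyad flip probability giving a keep/flip ratio of exp(epsilon)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response_graph(n, edges, epsilon, rng=None):
    """Release a noisy copy of an undirected graph on nodes 0..n-1.

    edges   : iterable of (i, j) pairs in the original graph
    epsilon : privacy parameter; smaller values mean more flipping
    rng     : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    p = flip_probability(epsilon)
    original = {frozenset(e) for e in edges}
    released = set()
    for i, j in combinations(range(n), 2):
        present = frozenset((i, j)) in original
        # Flip the presence or absence of this dyad with probability p.
        if rng.random() < p:
            present = not present
        if present:
            released.add((i, j))
    return released
```

At ε = 0 each dyad is flipped with probability 1/2, so the released graph carries no information about the original, while a large ε leaves the graph essentially unchanged.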